EDS: An Efficient Data Selection policy for search engine storage architectures
Authors
Abstract
Caching is an effective optimization in search engine storage architectures, and many caching algorithms have been proposed to improve retrieval performance. The data selection policy plays an important role in search engine cache management: it decides which data to place in memory and which in other storage, such as solid state disks (SSDs). Considering that the historical query log is a guide to future queries, we present an Efficient Data Selection (EDS) policy for search engine cache management, which views the cache media as a knapsack and views results and posting lists as items. The best benefit of EDS can be computed by greedy algorithms. We carry out a series of experiments to study the essential factors of data selection in different architectures, including hard disk drive (HDD), SSD, and SSD-based hybrid storage architectures. The hybrid storage architecture is a two-level cache architecture that uses SSD as a secondary cache for the memory. Our main goal is to improve the performance of search engines and reduce the cost of servers on the two-level cache architecture. The experimental results demonstrate that our proposed policy improves the hit ratio by 20.04% as well as the retrieval performance on HDD, SSD, and hybrid architectures.
Preprint submitted to Future Generation Computer Systems, January 24, 2016.
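The knapsack view described in the abstract can be illustrated with a minimal sketch. The abstract does not give the EDS benefit function, so everything here is an assumption: the `Item` fields, the `greedy_select` name, the example keys, and the benefit-per-byte ranking, which is a common greedy heuristic for knapsack-style selection and is not guaranteed to be optimal for indivisible items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A cacheable object (hypothetical fields, not taken from the paper)."""
    key: str        # e.g. a query result or a term's posting list
    size: int       # space the item occupies in the cache; must be > 0
    benefit: float  # assumed saving estimated from the historical query log


def greedy_select(items, capacity):
    """Fill a cache of the given capacity in descending benefit-per-size order.

    This is a density-based greedy heuristic: items that do not fit in the
    remaining space are skipped, and smaller items later in the ranking may
    still be admitted.
    """
    ranked = sorted(items, key=lambda it: it.benefit / it.size, reverse=True)
    chosen, used = [], 0
    for it in ranked:
        if used + it.size <= capacity:
            chosen.append(it)
            used += it.size
    return chosen


# Illustrative data only; sizes and benefits are made up.
items = [
    Item("result:q1", size=2, benefit=10.0),
    Item("list:termA", size=5, benefit=15.0),
    Item("list:termB", size=4, benefit=4.0),
    Item("result:q2", size=1, benefit=4.0),
]

selected = greedy_select(items, capacity=8)
print([it.key for it in selected])
```

In a two-level setting such as the memory-plus-SSD hybrid mentioned above, one plausible use of this sketch is to run the selection once per level, passing the items not admitted to memory on to the SSD capacity; whether EDS proceeds this way is not stated in the abstract.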
Similar resources
A Cache Design of SSD-based Search Engine Architectures: An Experimental Study
Caching is an important optimization in search engine architectures. Existing caching techniques for search engine optimization are mostly biased towards the reduction of random accesses to disks, because random accesses are known to be much more expensive than sequential accesses in traditional magnetic hard disk drive (HDD). Recently, solid state drive (SSD) has emerged as a new kind of secon...
An Alternate Idea for Storage Optimization in Search Engine
We propose an alternate index storage technique for search engines. This approach reduces space complexity. We aim to decrease time complexity for faster data retrieval and to decrease storage space for more efficient utilization. This paper provides an indexing algorithm that reduces the effective storage space. Space complexity of a Search Engine depends on th...
I Inverted Index Compression
The data structure at the core of nowadays large-scale search engines, social networks, and storage architectures is the inverted index. Given a collection of documents, consider for each distinct term t appearing in the collection the integer sequence $\ell_t$, listing in sorted order all the identifiers of the documents (docIDs in the following) in which the term appears. The sequence $\ell_t$ is called ...
Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines
Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...
Inverted Index Compression
The data structure at the core of nowadays large-scale search engines, social networks and storage architectures is the inverted index, which can be regarded as being a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by search engines and stringent performance requirements dictated by the heavy load of user queries, the inverted lists often st...
Journal title:
- Future Generation Comp. Syst.
Volume 74, Issue
Pages -
Publication date: 2017